On designing pronunciation lexicons for large vocabulary, continuous speech recognition

نویسندگان

  • Lori Lamel
  • Gilles Adda
چکیده

Creation of pronunciation lexicons for speech recognition is widely acknowledged to be an important, but labor-intensive, aspect of system development. Lexicons are often manually created and make use of knowledge and expertise that is difficult to codify. In this paper we describe our American English lexicon developed primarily for the ARPA WSJ/NAB tasks. The lexicon is phonemically represented, and contains alternate pronunciations for about 10% of the words. Tools have been developed to add new lexical items, as well as to help ensure consistency of the pronunciations. Our experience in large vocabulary, continuous speech recognition is that systematic lexical design can improve system performance. Some comparative results with commonly available lexicons are given.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Large vocabulary continuous speech recognition based on cross-morpheme phonetic information

In this paper, we present a novel method to regulate lexical connections among morpheme-based pronunciation lexicons for Korean large vocabulary continuous speech recognition (LVCSR) systems. A pronunciation dictionary plays an important role in subword-based LVCSR in that pronunciation variations such as coarticulation will deteriorate the performance of an LVCSR system if it is not well accou...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

Automatic generation of pronunciation lexicons for Mandarin spontaneous speech

Pronunciation modeling for large vocabulary speech recognition attempts to improve recognition accuracy by identifying and modeling pronunciations that are not in the ASR systems pronunciation lexicon. Pronunciation variability in spontaneous Mandarin is studied using the newly created CASS corpus of phonetically annotated spontaneous speech. Pronunciation modeling techniques developed for Engl...

متن کامل

Cross Lingual Modelling Experiments for Indonesian

The extension of Large Vocabulary Continuous Speech Recognition (LVCSR) to resource poor languages such as Indonesian is hindered by the lack of transcribed acoustic data and appropriate pronunciation lexicons. Research has generally been directed toward establishing robust cross-lingual acoustic models, with the assumption that phonetic lexicons are readily available. This is not the case for ...

متن کامل

lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition

This paper describes lex4all, an opensource PC application for the generation and evaluation of pronunciation lexicons in any language. With just a few minutes of recorded audio and no expert knowledge of linguistics or speech technology, individuals or organizations seeking to create speech-driven applications in lowresource languages can build lexicons enabling the recognition of small vocabu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996